 quadratic term


QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks

Chen, Qian, Yang, Linxin, Wang, Akang, Luo, Xiaodong, Zhang, Yin

arXiv.org Artificial Intelligence

The combination of linear transformations and non-linear activation functions forms the foundation of most modern deep neural networks, enabling them to approximate highly complex functions. This paper explores introducing quadratic transformations to further increase nonlinearity in neural networks, with the aim of enhancing the performance of existing architectures. To limit both parameter and computational complexity, we propose a lightweight quadratic enhancer that uses low-rankness, weight sharing, and sparsification techniques. For a fixed architecture, the proposed approach introduces quadratic interactions between features at every layer while adding only a negligible number of extra model parameters and forward computations. We conduct a set of proof-of-concept experiments for the proposed method across three tasks: image classification, text classification, and fine-tuning of large language models. In all tasks, the proposed approach demonstrates clear and substantial performance gains.
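To make the parameter savings from low-rankness concrete, the sketch below adds a rank-k quadratic correction to an affine layer. It is a hypothetical illustration, not the QuadEnhancer parameterization: the factors `u`, `v`, `c` and the helper names are assumptions, and weight sharing and sparsification are omitted. Each output i gains the term sum_r c[i][r] (u_r . x)(v_r . x), which costs k(2n + m) extra parameters instead of the order of m * n^2 needed for an unrestricted quadratic form per output.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Standard affine layer W x + b."""
    return [h + bi for h, bi in zip(matvec(w, x), b)]


def low_rank_quadratic(u: Matrix, v: Matrix, c: Matrix, x: Vector) -> Vector:
    """Hypothetical rank-k quadratic term: sum_r c[:, r] * (u_r . x) * (v_r . x).

    u, v: k x n (each row is one rank-1 factor); c: m x k (mixing weights).
    """
    pu = matvec(u, x)  # k linear projections of the input
    pv = matvec(v, x)  # k more projections
    prod = [a * b for a, b in zip(pu, pv)]  # k pairwise feature interactions
    return matvec(c, prod)  # mix the k interactions into m outputs


def enhanced_layer(w: Matrix, b: Vector, u: Matrix, v: Matrix, c: Matrix,
                   x: Vector) -> Vector:
    """Affine output plus the low-rank quadratic correction (illustrative only)."""
    return [h + q for h, q in zip(linear(w, b, x), low_rank_quadratic(u, v, c, x))]
```

With `c` set to zeros the layer reduces exactly to the plain affine layer, so the quadratic part can be viewed as an add-on to an existing architecture.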


Bregman Alternating Direction Method of Multipliers

Huahua Wang, Arindam Banerjee

Neural Information Processing Systems

The mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence in place of the squared Euclidean distance. In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. BADMM provides a unified framework for ADMM and its variants, including generalized ADMM, inexact ADMM, and Bethe ADMM. We establish the global convergence and the O(1/T) iteration complexity of BADMM. In some cases, BADMM can be faster than ADMM by a factor of O(n/ln n), where n is the dimensionality. In solving the linear program of the mass transportation problem, BADMM leads to massive parallelism and can easily run on GPUs. BADMM is several times faster than the highly optimized commercial solver Gurobi.
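For the transportation LP, one natural way to apply BADMM is to split the row and column marginal constraints between two copies X and Z of the transport plan (coupled by X = Z) and use the KL divergence as the Bregman term; each subproblem then has a closed-form multiplicative update followed by a rescaling. The sketch below is derived under that assumption and is not the authors' implementation; the function name and the defaults `rho=1.0`, `iters=200` are illustrative. Because every update is element-wise or a row/column normalization, the method parallelizes naturally.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def badmm_transport(cost: Matrix, a: List[float], b: List[float],
                    rho: float = 1.0, iters: int = 200) -> Tuple[Matrix, Matrix]:
    """KL-based Bregman ADMM sketch for min <C, X> s.t. X 1 = a, X^T 1 = b, X >= 0.

    X carries the row marginals, Z the column marginals, Y is the dual for X = Z.
    """
    m, n = len(a), len(b)
    z = [[a[i] * b[j] for j in range(n)] for i in range(m)]  # feasible start a b^T
    y = [[0.0] * n for _ in range(m)]
    x = z
    for _ in range(iters):
        # X-step: X ∝ Z * exp(-(C + Y) / rho), each row rescaled to sum to a[i]
        x = []
        for i in range(m):
            row = [z[i][j] * math.exp(-(cost[i][j] + y[i][j]) / rho) for j in range(n)]
            s = sum(row)
            x.append([a[i] * v / s for v in row])
        # Z-step: Z ∝ X * exp(Y / rho), each column rescaled to sum to b[j]
        z = [[x[i][j] * math.exp(y[i][j] / rho) for j in range(n)] for i in range(m)]
        for j in range(n):
            s = sum(z[i][j] for i in range(m))
            for i in range(m):
                z[i][j] *= b[j] / s
        # Dual ascent on the coupling constraint X = Z
        for i in range(m):
            for j in range(n):
                y[i][j] += rho * (x[i][j] - z[i][j])
    return x, z
```

On the 2 x 2 problem with cost [[0, 1], [1, 0]] and uniform marginals, the off-diagonal mass shrinks geometrically, so the transport cost approaches the LP optimum of 0.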



A quantum annealing approach to graph node embedding

Djidjev, Hristo N.

arXiv.org Artificial Intelligence

Node embedding is a key technique for representing graph nodes as vectors while preserving structural and relational properties, which enables machine learning tasks like feature extraction, clustering, and classification. While classical methods such as DeepWalk, node2vec, and graph convolutional networks learn node embeddings by capturing structural and relational patterns in graphs, they often require significant computational resources and struggle with scalability on large graphs. Quantum computing provides a promising alternative for graph-based learning by leveraging quantum effects and introducing novel optimization approaches. Variational quantum circuits and quantum kernel methods have been explored for embedding tasks, but their scalability remains limited due to the constraints of noisy intermediate-scale quantum (NISQ) hardware. In this paper, we investigate quantum annealing (QA) as an alternative approach that mitigates key challenges associated with quantum gate-based models. We propose several formulations of the node embedding problem as a quadratic unconstrained binary optimization (QUBO) instance, making it compatible with current quantum annealers such as those developed by D-Wave. We implement our algorithms on a D-Wave quantum annealer and evaluate their performance on graphs with up to 100 nodes and embedding dimensions of up to 5. Our findings indicate that QA is a viable approach for graph-based learning, providing a scalable and efficient alternative to previous quantum embedding techniques.
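Any formulation sent to a D-Wave annealer must take the QUBO form: minimize sum_{i,j} Q_ij x_i x_j over binary x. The node-embedding formulations from the paper are not reproduced here; as a stand-in, the sketch encodes max-cut (a one-bit partition of the nodes) as a QUBO and checks it by exhaustive search, which is one way to validate a small encoding classically before submitting it to hardware. The helper names are hypothetical, and brute force is only practical for roughly 20 or fewer variables.

```python
from itertools import product
from typing import Dict, List, Tuple

Qubo = Dict[Tuple[int, int], float]


def qubo_energy(q: Qubo, x: Tuple[int, ...]) -> float:
    """E(x) = sum Q_ij x_i x_j; diagonal entries act as linear biases since x_i^2 = x_i."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_qubo(q: Qubo, n: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively minimize a QUBO over {0,1}^n (small n only)."""
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(q, x))
    return best, qubo_energy(q, best)


def maxcut_qubo(edges: List[Tuple[int, int]]) -> Qubo:
    """Max-cut QUBO: each edge adds 2 x_i x_j - x_i - x_j, which is -1 iff the edge is cut."""
    q: Qubo = {}
    for i, j in edges:
        q[(i, j)] = q.get((i, j), 0.0) + 2.0
        q[(i, i)] = q.get((i, i), 0.0) - 1.0
        q[(j, j)] = q.get((j, j), 0.0) - 1.0
    return q
```

The minimum energy equals minus the maximum cut size: -2 for a triangle and -4 for a 4-cycle, the latter attained only by the two alternating assignments.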


Nearly Optimal Differentially Private ReLU Regression

Ding, Meng, Lei, Mingxi, Wang, Shaowei, Zheng, Tianhang, Wang, Di, Xu, Jinhui

arXiv.org Machine Learning

In this paper, we investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression rely heavily on stringent assumptions, such as constant bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data are sampled i.i.d. from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and some public data are available, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk under $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirements on $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing attack argument technique, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.
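The GLMtron update for ReLU regression moves the weights along (ReLU(w . x) - y) x, without the ReLU derivative. The sketch below makes a one-pass mini-batch version of that update noisy in the usual DP-SGD style (per-example clipping plus Gaussian noise). It is an illustrative assumption, not the paper's DP-MBGLMtron: the clipping rule, step size, and noise multiplier `sigma` are hypothetical, and no privacy accounting is performed, so it does not certify any $(\varepsilon, \delta)$.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def noisy_minibatch_glmtron(data: List[Tuple[Vector, float]], d: int, batch: int,
                            lr: float, clip: float, sigma: float,
                            seed: int = 0) -> Vector:
    """One pass of mini-batch GLMtron with per-example clipping and Gaussian noise.

    sigma scales the noise relative to the clipping norm; it is NOT calibrated
    to any privacy budget in this sketch.
    """
    rng = random.Random(seed)
    w = [0.0] * d
    for start in range(0, len(data) - batch + 1, batch):  # each example used once
        g = [0.0] * d
        for x, y in data[start:start + batch]:
            r = relu(dot(w, x)) - y  # GLMtron residual
            step = [r * xi for xi in x]
            norm = math.sqrt(dot(step, step))
            scale = min(1.0, clip / norm) if norm > 0 else 1.0  # clip to norm <= clip
            g = [gi + scale * si for gi, si in zip(g, step)]
        noise = [rng.gauss(0.0, sigma * clip) for _ in range(d)]
        w = [wi - lr * (gi + ni) / batch for wi, gi, ni in zip(w, g, noise)]
    return w
```

With `sigma=0.0` the routine is plain (clipped) mini-batch GLMtron, which is a convenient sanity check before noise is added.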


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Overview: This paper studies the benefits of augmenting the linear programming relaxation of the maximum a-posteriori (MAP) inference problem in graphical models with a quadratic term, thereby achieving strong convexity. Such augmented formulations are derived from both the original primal and the dual formulations, and in each case the resulting primal-dual relationship is studied. Prior work has mostly focused on smoothing the LP formulation using a softmax/entropy term, with a few notable exceptions, such as [5], [17] and [18]. Unlike those previous approaches, which employ a quadratic term in the sub-problems of either a *proximal* or an *alternating direction* scheme, the present manuscript adds the quadratic smoothing term directly. This can in some sense be seen as a naive approach: in contrast to proximal or alternating direction schemes, convergence to the global optimum of the original problem is no longer guaranteed, and the approximation quality depends directly on the strength of the augmentation term.
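The trade-off described above can be seen on a toy problem: minimizing a linear cost c . mu over a single probability simplex, the kind of local constraint that appears in the MAP LP relaxation. Adding (lam/2)||mu||^2 makes the objective strongly convex, and completing the square shows the minimizer is the Euclidean projection of -c/lam onto the simplex. This is an illustration chosen here, not the paper's formulation over the full local polytope; the function names are hypothetical.

```python
from typing import List


def project_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    css = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:  # holds for a prefix of k; keep the last threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def smoothed_lp_on_simplex(c: List[float], lam: float) -> List[float]:
    """argmin over the simplex of c.mu + (lam / 2) * ||mu||^2, i.e. proj(-c / lam)."""
    return project_simplex([-ci / lam for ci in c])
```

For c = (1, 2, 3), lam = 0.1 returns the LP vertex (1, 0, 0), while lam = 1000 returns a nearly uniform point, so how closely the smoothed solution matches the LP solution depends directly on lam.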


QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation

Xu, Chenhui, Wang, Xinyao, Yu, Fuxun, Xiong, Jinjun, Chen, Xiang

arXiv.org Artificial Intelligence

Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
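One way to reuse pre-trained weights in a quadratic neuron, assumed here for illustration, is the form y = W_c x + b + (W_a x) * (W_b x) (element-wise product): copy W_c and b from a trained linear layer and zero-initialize W_b, so the layer reproduces the pre-trained output exactly at the start of training. The class name, the Gaussian scale 0.02 for W_a, and the exact neuron form are assumptions, not QuadraNet V2's published definitions.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class QuadraticNeuronLayer:
    """Hypothetical quadratic layer: y = W_c x + b + (W_a x) * (W_b x), element-wise."""

    def __init__(self, w_pre: Matrix, b_pre: Vector, seed: Optional[int] = None):
        rng = random.Random(seed)
        m, n = len(w_pre), len(w_pre[0])
        self.w_c = [row[:] for row in w_pre]  # primary term copied from a pre-trained layer
        self.b = b_pre[:]
        self.w_a = [[rng.gauss(0.0, 0.02) for _ in range(n)] for _ in range(m)]
        self.w_b = [[0.0] * n for _ in range(m)]  # zero init: quadratic term starts at 0

    def __call__(self, x: Vector) -> Vector:
        primary = [p + bi for p, bi in zip(matvec(self.w_c, x), self.b)]
        quad = [a * b for a, b in zip(matvec(self.w_a, x), matvec(self.w_b, x))]
        return [p + q for p, q in zip(primary, quad)]
```

Only one quadratic factor is zeroed: if both W_a and W_b started at zero, the gradient of their product with respect to each factor would also be zero, and the quadratic term could never start learning.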


Energy-Preserving Reduced Operator Inference for Efficient Design and Control

Koike, Tomoki, Qian, Elizabeth

arXiv.org Artificial Intelligence

Many-query computations, in which a computational model of an engineering system must be evaluated many times, are crucial in design and control. For systems governed by partial differential equations (PDEs), typical high-fidelity numerical models are high-dimensional and too computationally expensive for the many-query setting. Thus, efficient surrogate models are required to enable low-cost computations in design and control. This work presents a physics-preserving reduced model learning approach that targets PDEs whose quadratic operators preserve energy, such as those arising in the governing equations of many fluids problems. The approach is based on the Operator Inference method, which fits reduced model operators to state snapshot and time derivative data in a least-squares sense. However, Operator Inference does not generally learn a reduced quadratic operator with the energy-preserving property of the original PDE. Thus, we propose a new energy-preserving Operator Inference (EP-OpInf) approach, which imposes this structure on the learned reduced model via constrained optimization. Numerical results using the viscous Burgers' equation and the Kuramoto-Sivashinsky equation (KSE) demonstrate that EP-OpInf learns efficient and accurate reduced models that retain this energy-preserving structure.
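Operator Inference, as described above, is a least-squares regression of time-derivative data onto the reduced state and its quadratic monomials. The sketch below fits dx/dt ≈ A x + H q(x), with q(x) the unique products x_i x_j (i <= j), by solving the normal equations, and measures the energy-preservation residual x . H q(x). It implements plain, unconstrained Operator Inference only; the EP-OpInf constraint is not included, and the helper names are hypothetical. Normal equations are used for brevity; a QR-based solver would be more robust for ill-conditioned data.

```python
from typing import List

Vector = List[float]


def quad_features(x: Vector) -> Vector:
    """Unique quadratic monomials x_i x_j with i <= j."""
    r = len(x)
    return [x[i] * x[j] for i in range(r) for j in range(i, r)]


def solve(mat: List[Vector], rhs: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))) / aug[r][r]
    return sol


def operator_inference(states: List[Vector], derivs: List[Vector]) -> List[Vector]:
    """Least-squares fit of xdot ~ A x + H q(x); returns one row [A_k | H_k] per component."""
    feats = [x + quad_features(x) for x in states]
    p = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(p)] for i in range(p)]
    rows = []
    for k in range(len(derivs[0])):
        rhs = [sum(f[i] * dx[k] for f, dx in zip(feats, derivs)) for i in range(p)]
        rows.append(solve(gram, rhs))
    return rows


def energy_violation(rows: List[Vector], x: Vector) -> float:
    """|x . H q(x)|: zero for every x when the quadratic operator preserves energy."""
    r = len(x)
    q = quad_features(x)
    return abs(sum(x[k] * sum(h * qi for h, qi in zip(rows[k][r:], q)) for k in range(r)))
```

With noise-free derivative data generated from an energy-preserving model, the unconstrained fit recovers the operators and the residual is near zero; the abstract's point is that with real snapshot data this is not generally the case, which is what the EP-OpInf constraints address.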